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Abstract 

We propose a new method for the estimation of parameters of hidden diffusion 
processes. Based on parametrization of the transition matrix, the Baum- Welch al- 
gorithm is improved. The algorithm is compared to the particle filter in application 
to the noisy periodic systems. It is shown that the modified Baum- Welch algorithm 
is capable of estimating the system parameters with better accuracy than particle 
filters. 
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1 Introduction 



Identification or parameter estimation is one of the most important and inter- 
esting fields in the nonlinear dynamics and time series analysis. There exist 
many methods of identifying the parameters of a nonlinear stochastic system 
such as maximum Likelihood estimators and Bayes estimators [1]. Especially 
this latter is related to the Sequential Monte Carlo methods [2] which are 
also known as Particle filter methods introduced by Gordon et al. [3]. These 
methods utilize a large number of random samples (or particles) to represent 
the posterior probability distributions. The particles are propagated over time 
using a combination of sequential importance sampling and resampling steps. 
At each time step the resampling procedure statistically multiplies and/or dis- 
cards particles to adaptively concentrate particles in regions of high posterior 
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probability. 

Particle filter methods are usually applied to state space models to approx- 
imate the posterior probability distributions of the state given the available 
observation. If the state space models contain a set of unknown parameters 
which are to be estimated then one can include them in the model by aug- 
menting the state vector [4]. 

As an alternative method for estimating parameters of continuous hidden dif- 
fusion processes we propose a hidden Markov models (HMMs) [5] approach 
which model both the signal and noise simultaneously [6]. This is based on 
the approximation of continuous systems by discrete models. The underlying 
signal is assumed to be generated by a discrete Markov chain. The latter uses 
the joint probability of the sequence of the discrete observation samples as the 
likelihood function. The general theory of HMMs was established by Baum et 
al. in the sixties [7,8,9]. 

Standard HMMs rely on the Baum- Welch reestimation procedure to optimize 
the likelihood function [5]. The standard Baum- Welch algorithm suffers from 
the problem that it may converge to a local minimum. However, we can over- 
come this difficulty by parameterizing the transition matrix [10]. 
Previous works have shown that hidden Markov models are successful tools for 
modeling and classifying dynamic behaviors. For example, HMMs are used for 
analyzing biological sequences [11], speech recognition [12], ion channel anal- 
ysis [13,14,15], and to detect different modes of neuronal activity [16]. 
In experimental physics, the objective of any measurement is to determine the 
value of the particular quantity to be measured. In general, however, the re- 
sult of a measurement is only an approximation or estimate of the value one is 
looking for. For instance in coupled Josephson junctions [17], direct measure- 
ments of the time dependence of the voltage are usually impossible because 
the characteristic time scale of voltage variations is too short (~ picoseconds). 
One can usually measure the Josephson radiation emission in some narrow fre- 
quency range, which can show chaotic behaviour. But in this case one cannot 
see higher harmonics which are required to fully reconstruct the voltage time 
evolution. In experiments, another version of the voltage is usually measured 
which is the results of the low pass filtering. The obtained voltage is used to 
extract the current- volt age characteristic. In this case, the observed variable is 
the voltage whereas the Josephson phase is hidden. Likharev [17] reported that 
the coupled Josephson junctions belong to a class of complex systems. The 
corresponding experimental works show that it is difficult to estimate some of 
the parameters characterizing the Josephson device, e.g. the damping related 
to the fluctuation of the temperature and the maximal Josephson current. 
For the case that the measured time series proved to be approximately Marko- 
vian, Friedrich et al [18,19] proposed an approach to obtain the drift and diffu- 
sion of one-dimensional Langevin equations from the time series. This is based 
on the finite-difference form of their definition together with suitable interpo- 
lations of the resulting trends. Ragwitz et al. [20] proposed a correction of this 
approach to reduce the errors due to a finite times step. This was controversy 
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for the case of directly observed states of continuous diffusion processes, which 
were measured in discrete time [20], [19], [21]. We believe that our approach 
can then be used to clarify this situation. 

This paper is devoted to a numerical evaluation of these two methods by ap- 
plying them, for instance, to the problem of diffusion in periodic potentials 
with noisy observations. The latter example is taken as a periodic function of 
the coordinate of the diffusing state. The aim of this work is to estimate the 
drift coefficient and the diffusion constant. 

The paper is organized as follows. In the next section, we formulate the prob- 
lem of hidden diffusion processes. In section 3, we review the particle filter 
and propose a modified Baum- Welch algorithm. Section 4 is devoted to a nu- 
merical simulation and the evaluation of both methods. Conclusions are given 
in the last section. 



2 Mathematical model 

Diffusion processes are usually modeled by the evolution equation of the prob- 
ability density function V which is governed by the continuous Fokker-Planck 
equation. This can be read in one dimension 

y>(*> t) = ~§- x [F{x)V{x, t)] + ~ [D(x)V(x, t)] , (1) 

where F is the drift and D is the diffusion coefficient. The process of Eq. (1) 
can equivalently be described by a Langevin equation interpreted in It 6 sense 

x = F(x) + ^D(xjv(t) , (2) 

where v(t) is intrinsic white noise with the density q(v t ), and the initial con- 
dition of Eq. (2) is given by x(t = 0) ~ /i(x(0)). 

The state variable x(t) can usually not be observed directly but only via a 
measurement process, which is modeled by an observation function h as 

y(t) = h(x(t),w(t)) , (3) 

where w(t) denotes observation noise with the density r(w t ) which is indepen- 
dent of v(t). 

Eq. (2) together with Eq. (3) define hidden diffusion processes. 
In general, the observation function h is nonlinear. Thus, the diffusing state 
x is hidden. The estimation problem therefore becomes difficult to tackle and 
there exists no analytical method dealing with diffusion coefficient estimation. 



3 



There exist only approximate numerical methods such as particle filters. In 
practice, the particle filters are applied to discretized version of the system 
(2-3), which result in the state space model 



where G t and H t are assumed to be known nonlinear functions, the dynamical 
white noise v t and the observation white noise w t are independent random 
processes and the initial condition x ~ p(x ). 

In the next section, we give a short overview of the particle filters and the cor- 
responding implementation issues (more details can be found in [3,2,22,23]). 
We assume that the diffusion coefficient D(x) = D is constant throughout the 
paper. 



3 Algorithms 

3.1 Particle filter algorithm (Monte Carlo Filter) 

Consider systems that are described by the generic state space model (4-5). 
Sequential Monte Carlo methods or particle filters provide an approximate 
Bayesian solution to the discrete time recursive of the state space model (4-5) 
by updating an approximate description of the posterior filtering density. 
Let x t denote the state of the observed system and y t = {j/j}* =1 the set of 
observations up to the present time t. Let the independent process noise v t 
and the measurement noise Wt with the densities q(vt) respective r(wt). The 
initial uncertainty is described by the density p(xo). The particle filter ap- 
proximates the probability density p(x t \y t ) by using a large set of N p particles 
{xi^}^, where each particle has an assigned relative weight mf \ such that 
all weights sum to one. The particle filter updates the particle location and 
the corresponding weights recursively with each new observation. The non- 
linear prediction density p(x t \y t -i) and optimal filtering density p(x t \y t ) for 
the Bayesian interference are given by 



x t = Gt(xt-i,v t ) 
y t = H t (x u w t ) , 



(4) 
(5) 



p(xt\yt-i)= / p{xt\x t - 1 )p(x t - 1 \y t -i)Axt. 




-1 



(6) 



(7) 
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where p(yt\yt-i) — I p{yt\xt)p{xt |iVt-i)d^t- The transition probability density 
p(xt\xt-i) is know as the motion model (4) and p(x t _i|[Vt-i) is the updated 
estimate from the previous step. p(yt\x t ) is the observation probability density 
given by Eq. (5). 

Note that, generally, these equations are not analytically tractable. However, 
for the important special case of linear dynamics, linear measurements and 
Gaussian noise there exist a closed form solution of Eq.(6-7) , given by the 
Kalman filter [24]. 

The main idea of the optimal filter is to approximate p(x t \y t -i) with 
i n p 

p(xt\y t -i)* w Y, s ( x t-4 l) ), (8) 
iv ?> i=i 



where 5 is the Dirac delta distribution. 

Inserting (8) into (7) yields a density of a simple form. This can be done 
by using the Bayesian bootstrap or Sampling Importance Resampling (SIR) 
algorithm from [2] which is given by the following algorithm 



(1) At t — 0, generate random numbers Xq ~ p(xo) for j = 1, N p 

(2) Repeat the following steps for t — 1, ...,T 

(a) Generate random numbers ~ q(v ) for j = 1, N p 

(b) Compute p[ j) = G{xi\, v\ j) ) for j = 1, N p 

(c) Compute mj j) = p(y t \Pt ] ) for j = 1, N p 

(d) Resample with replacement N p particles {x^} from {p{^} according to 
the importance weights 

Table 1: Particle filter algorithm 



Note that the resampling procedure (step (2d) in the Table 1) selects only the 
fittest particles to obtain an unweighted measure {{x^\ -^-}. 
Sometimes the resampling step is omitted and simply imposed when needed 
to avoid a divergence in the particle filter as in the sequential importance 
sampling (SIS) method, where the weight is updated recursively as [23] 

= m t-i ■ p{yt\Pt ] ) for j = 1, N p 

As the estimate of the state we choose the minimum mean square estimate, 
i.e. 

x t = x t p(x t \y t )dx t ?s ^m[ l) x[ l) . (9) 
J i=i 
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where = / rnf 1 . 



Parameter estimation: the state space model (4-5) usually contains several 
unknown parameters, such as the variances of the noises and the coefficients 
of the functions F t and H t . Let us denote such unknown parameters by 6 = 
(60, ■ ■ ■ , @ Nq ) • We consider a Bayesian estimation problem by augmenting the 
state vector x t with the unknown parameter vector 6 as 



z t = 



x t 
0t 



(10) 



with 9 t = 0. The state space model for this augmented state vector z t is thus 

zt = Gt(zt-i,vt) 
y t = HZ(z t ,w t ) , 



where the nonlinear functions G^(z, v) = (G t {x, v), 9 t ) and H*(z, w) = H t (x, w). 
We can therefore apply the particle filter algorithm to the state space models 
described by Eq. (11) as previously. 



3.2 EMM and Modified Baum- Welch algorithm 



The approximation of continuous hidden diffusion processes (2)-(3) by discrete 
models results in Hidden Markov Models (HMMs). In [10] the diffusion process 
was approximated by a discrete random walk with N states with only nearest 
neighbor transitions. For the observation process we considered an appropriate 
discrete process, which is well defined in the continuous time limit, e.g. consider 
Eq. (3) in discrete form. 

Comparing Fokker-Planck equations on the one hand with discrete time and 
space master equations on the other hand, it is easy to establish the connection 
between the continuous diffusion process and the Markov model parameters 
as in [10] 



j-,/. A \ a i,i+l a i,i—l n 

( ) = Ax 

D(i Ax) = [(a M+ i + a M _i) - (a M+i - a^-i) 2 ]/^ , 



(12) 
(13) 
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where D = and the elements of the transition matrix. The con- 

tinuum limit Ax, At — > can be approached by keeping Dq constant. 
Relations (12-13) are important, because they give a justification for the ap- 
proximation of continuous diffusion processes by discrete models. 
Standard HMMs rely on standard Baum- Welch reestimation procedure to op- 
timize the likelihood function (more details can be found in [10]). The proce- 
dure may have several drawbacks if it is applied to the problem of diffusion 
in periodic potentials with noisy observations. Since the observation function 
was simply chosen as the cosine of the state variable, the maxima of likeli- 
hood function are degenerate e.g. each state is observed with two different 
observations. More importantly, in order to converge to the continuous hid- 
den diffusion processes, we should choose a large number of states, which 
means that many parameters should be re-estimated. In this case, the stan- 
dard Baum- Welch algorithm is not applicable, due to the limited number of 
observations. To avoid these problems we have to use a modified version of 
the Baum- Welch algorithm . It consists in parameterizing the matrix of the 
transition probability. For instance, this can be done by a Fourier expansion 
of the elements of the transition matrix. 



{ak,k+j} = {ak,k+j(d)} = J2 °n(j)exp \i 2n — ) for j = -1,1, (14) 



Following [25] we obtain N e nonlinear implicit equations 



da l3 {9^) ( K-(flW)}) VuQh {au(9M)}) \ . . 



for k — 1, • • • , Nq. 

The calculation of the conditional probability ^ij(y t , {a.ij(9^)}) at iteration 
n can be carried out by using the forward-backward algorithm given in [5]. 
In case of homogeneous random walks we derived in [10] an explicit expression 
for the new estimates of the parameterized transition probabilities in terms 
of previous estimates and the observed signal. In general, Eq. (15) has to be 
solved numerically, for example, by using the Newton methods. Then one can 
find the fixed point 6* solution of Eq. (15). 

Given a set of observation data y t = {y}* =1 , the core of the modified Baum- 
Welch algorithm reads 
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(1) Generate a hidden Markov model "trainer" with N states and M distinct 
observation symbols 

(a) Generate a tridiagonal transition matrix according to Eq. (14) 

(b) The observation matrix is given by the time discretization of Eq. (3) 

(2) Repeat until \logP(y\a i:j (9^) - logP^a^"- 1 ))) < 1(T 6 

(a) Compute the conditional probability ^ij(y t , {^(0^)}) using the 
forward-backward algorithm given in [5] 

(b) Update the elements of the transition matrix using the formula (15) 



Table 2: Combined HMM and modified Baum- Welch algorithm 

An obvious advantage of the modified Baum- Welch algorithm is that it is 
independent of the number of states N but depends only on the number of 
Fourier coefficients. 



4 Simulation Results 



We now present a simple example to illustrate the central ideas in this paper. 
We consider the system 



X t = (e + £ 9 n sm(2nnX t /L) j +VDv t (16) 

Y t = (co8(2nX t /L)) + y/jw t , (17) 

with the initial condition X ~ A/"(0, 1). The driving noise v t and the obser- 
vation noise Wt are independent Gaussian random processes of variance one. 
In Eq.(16-17), L is the spatial extension (period) equal to NAx ( N is the 
number of states of the discrete model) and 9 = (6q, ■ ■ • , 6^ , D) is the set of 
parameters to be estimated. Eq. (17) describes the observation processes. 
For a practical implementation of the particle filter and the modified Baum- 
Welch algorithm, the necessary sample paths and stochastic integrals must 
be discretely approximated. Appropriate numerical methods are discussed by 
Kloden and Platen [26]. The Euler scheme is used here for this aim. 
Once the observation sequence is generated by the model (16-17), we apply 
the two algorithms to reestimate the drift term and the diffusion constant. 
Note that the application of particle filters in estimating parameters requires 
regarding the set of parameters 6 as time dependent. That is, we have to con- 
sider a different model in which is replaced by 9 t at time t, and to include t 
in the augmented state vector. Then we add an independent, zero-mean nor- 
mal increment to the parameters at each time step. As a result, the discretized 
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equations of system (16-17) read 



x l = x*- 1 + (el 1 + Znli Ol 1 sin(2mrX t /L)) At + 6^ +1 v t 
O t n = t n 1 + < for n = 0,l,---,N e + l 
y t = cos{2'kx 1 /L) + y/aw t , 



(18) 



where 

(?N g +i) = D t, v t ~ Af(0, y/Al), u* ~ jV(0, ev^At) and ^ ~ jV(0, 1). 



For simplicity, we restrict ourselves to the case N g = 1 and At = 1. 
In order to compare the numerical results given by the particle filter with the 
modified Baum- Welch algorithm we consider the drift parameters 6 = —0.1, 
6*i = 0.1, the diffusion constant D = 0.8 and we assume that there is no 
observation noise a — 0. 

First, the particle filter is applied to the entire augmented state vector, using 
the scheme of Table 1. The initial value and the initial covariance of the 
estimated augmented state vector (18) we were set to 






V / 



(h 2 \ 

.l 2 

.l 2 

.l 2 



(19) 



The actual initial value of the state vector was drawn randomly from jV(0, P°). 
Fig. 1 shows the true state x, the parameters 6q, 6i and the diffusion coeffi- 
cient D as a function of time and represent it as black solid lines. The values 
estimated from N p = 1000 are shown by red solid lines. After convergence, the 
particle filter gives a "reasonable" estimation for the state and better estimate 
of the correct values of the drift parameters (9 , 9i) and the diffusion constant 
D. However, the estimate state x does not totally agree with the true values. 
Note in Fig. 1 the stochastic character of the particle filter (because it is based 
on Monte Carlo methods). Fig. 2 presents the estimated filtering distributions. 
One can clearly see from this figure the multimodal non-Gaussian posterior 
distribution character. 

Moreover, Tables 3 and 4 show the performance of the particle filters as func- 
tion of the number of particles for two lengths of observation, T = 100 and 
T = 1000. More specifically, each table shows how many runs out of a total of 
100 simulations diverged. 
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1000 



200 400 600 800 1000 200 400 600 800 1000 

t t 

Fig. 1. State x (top left panel), parameter 6q (top right panel), parameter 6\ (bottom 
left panel) and diffusion coefficient D (bottom right panel) vs. time. The correct 
values are shown by solid black lines and the estimated values after applying the 
particle filter algorithm are represented by solid red lines. 



T=100 



Number of particles 


100 


500 


1000 


#0 


92% 


73% 


43% 


Ox 


88% 


74% 


45 % 


D 


91% 


76% 


48 % 



Table 3: Percentage of diverged runs of the estimated parameters for the particle 

filter. 



One clearly sees from tables that it takes many particles and a large number 
of iterations for the particle filter to work well. The main reason for this is 
well known the degeneracy of particle filter if the process noise has a small 
variance [23]. 
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Fig. 2. Probability density function given by the particle filter algorithm 



T=1000 



Number of particles 


100 


500 


1000 




89% 


25% 


5% 


di 


86% 


25% 


6% 


D 


90% 


69% 


10% 



Table 4: Percentage of diverged runs of the estimated parameters for the particle 

filter. 

In order to use a discrete HMM, we must first quantize the observation data 
into a set of standard vectors according to Elliott [6] . The quantized data are 
used as training sets for a HMM which has to learn the correct parameters 
from these observations. 

Here, we have implemented the modified Baum- Welch algorithm described 
in Table 2. More details on implementation issues can be found in [10]. 

Fig. 3 shows the drift function and diffusion constant as a function of the 
coordinate x. The estimated values are represented as dot-dashed lines after 
applying the particle filter and as dashed lines for the modified Baum- Welch 
algorithm. 

One can see from this figure that a convergence of the modified Baum- Welch 
algorithm to the correct parameter values was obtained. Moreover, the conver- 
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Fig. 3. Drift F(x) (left panel) and diffusion coefficient D (right panel) vs. the state. 
The correct parameters for the drift are 6q = —0.1 and 6\ = 0.1, and for the diffusion 
D = 0.8 which are shown by solid lines. The estimated values given by the modified 
Baum- Welch algorithm are shown by dashed lines and the estimated values given 
by particle filter are represented by dot-dashed lines. 

gence is very fast, n ~ 15 (n is the number of iterations), whereas the particle 
filter algorithm needs a large number of particles, N p = 1000, and needs larger 
number of iterations (around 500) until convergence is obtained. Therefore, 
the time consuming is more relevant for the particle filter algorithm than for 
the modified Baum- Welch algorithm. 



Note in Fig. 3 that using the modified Baum- Welch algorithm, the estimation 
of the drift function is better in the interval x < 5, whereas, it is better in the 
domain x > 5 for the particle filter. This is the inverse situation if we choose 
another initial condition. 



5 Conclusion 



In this paper, we have proposed a modified Baum- Welch algorithm based on 
a parametrization of the transition matrix associated with HMMs. This al- 
gorithm has been compared to particle filters with the aim to reestimate the 
parameters of hidden diffusion processes in periodic potentials and, more pre- 
cisely, to estimate the drift coefficient and the diffusion constant of periodic 
stochastic systems. 

Our simulations show the following results: The particle filter algorithm, where 
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the number of samples and the length of observation are chosen to be large 
(N p = 1000, T = 1000), converges quantitatively to the correct values of the 
drift and diffusion coefficients. The great advantage of the particle filter algo- 
rithm is its enormous flexibility. It can be applied to practically all nonlinear 
and/or non-Gaussian high-dimensional state space models within a statistical 
framework. This algorithm, however, is stochastic in nature (based on Monte 
Carlo) and, it requires a relatively large number of samples to ensure a fair 
maximum likelihood estimate of the current state. In contrast, the modified 
Baum- Welch algorithm is deterministic and the transition probabilities be- 
tween the hidden states are constrained by the parametrization. The modified 
Baum- Welch algorithm converges to the correct results within 20-30 iterations 
of the reestimation procedure. Thus, the basic idea of this paper works well 
and the performance in large iV (continuum limit) can also be evaluated also 
for more complicated situations. 
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